Dataset statistics
| Number of variables | 14 |
|---|---|
| Number of observations | 1458637 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 155.8 MiB |
| Average record size in memory | 112.0 B |
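The memory figures fit a simple storage model. The model assumes that each of the 14 columns stores one 8-byte value per row (for example `int64` or `float64`) and that the index adds nothing per row. Both points are assumptions, because the report does not list dtypes. Under that model, the arithmetic is:

```python
# Assumption: every column stores one 8-byte value per row (int64/float64)
# and the index (e.g. a RangeIndex) adds no per-row memory.
N_ROWS = 1_458_637
N_COLS = 14
BYTES_PER_VALUE = 8
MIB = 1024 ** 2  # 1 MiB = 1,048,576 bytes

column_bytes = N_ROWS * BYTES_PER_VALUE  # one column ("Memory size" per variable)
total_bytes = column_bytes * N_COLS      # whole frame ("Total size in memory")
record_bytes = total_bytes / N_ROWS      # per row ("Average record size")

print(f"Per column: {column_bytes / MIB:.1f} MiB")
print(f"Total: {total_bytes / MIB:.1f} MiB")
print(f"Per record: {record_bytes:.1f} B")
```

These values match the 11.1 MiB shown for each variable and the totals in the table above.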
Variable types
| NUM | 12 |
|---|---|
| BOOL | 1 |
| CAT | 1 |
Reproduction
| Analysis started | 2021-12-20 17:38:40.682138 |
|---|---|
| Analysis finished | 2021-12-20 17:41:36.058434 |
| Duration | 2 minutes and 55.38 seconds |
| Version | pandas-profiling v2.7.1 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
Warnings
| pickup_longitude is highly skewed (γ1 = -418.1221795) | Skewed |
|---|---|
| dropoff_longitude is highly skewed (γ1 = -425.3309936) | Skewed |
| dropoff_latitude is highly skewed (γ1 = -20.67128816) | Skewed |
| trip_duration is highly skewed (γ1 = 343.2504643) | Skewed |
| df_index is uniformly distributed | Uniform |
| df_index has unique values | Unique |
| hour has 53248 (3.7%) zeros | Zeros |
| minute has 23788 (1.6%) zeros | Zeros |
| second has 24290 (1.7%) zeros | Zeros |
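A small stdlib sketch can approximately reproduce the Skewed and Zeros warnings. It uses the adjusted Fisher-Pearson skewness G1, which is the estimator pandas' `Series.skew` computes. The two thresholds are assumptions:

- A cutoff of 20 for |γ1| fits which columns this report flags: dropoff_latitude at |γ1| = 20.67 is flagged, while pickup_latitude at 5.49 is not.
- A 1% zeros share is a guess that fits the flags here: trip_duration at 0.6% is not flagged, while minute at 1.6% is.

The helper names are hypothetical and are not pandas-profiling's API.

```python
import math

# Assumed thresholds: 20 for |skewness| is consistent with the flags in this
# report; the 1% zeros share is an illustrative guess, not a documented default.
SKEW_THRESHOLD = 20.0
ZEROS_THRESHOLD = 0.01


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness G1, the estimator pandas' Series.skew uses."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # 2nd central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # 3rd central moment
    if n < 3 or m2 == 0:
        return 0.0  # undefined for n < 3; this sketch returns 0 for tiny or constant samples
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def numeric_warnings(name, values):
    """Return report-style warning rows for one numeric column (hypothetical helper)."""
    rows = []
    skew = sample_skewness(values)
    if abs(skew) > SKEW_THRESHOLD:
        rows.append(f"{name} is highly skewed (γ1 = {skew:.10g}) | Skewed")
    zeros = sum(1 for v in values if v == 0)
    share = zeros / len(values)
    if share > ZEROS_THRESHOLD:
        rows.append(f"{name} has {zeros} ({share:.1%}) zeros | Zeros")
    return rows
```

A single extreme value among many small ones, like the 58771 maximum of trip_duration, is enough to push G1 far past the cutoff.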
df_index
Real number (ℝ≥0)
| Distinct count | 1458637 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 729319.5052339958 |
|---|---|
| Minimum | 0 |
| Maximum | 1458643 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Memory size | 11.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 72931.8 |
| Q1 | 364659 |
| median | 729319 |
| Q3 | 1093980 |
| 95-th percentile | 1385709.2 |
| Maximum | 1458643 |
| Range | 1458643 |
| Interquartile range (IQR) | 729321 |
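The quantile table is consistent with linear interpolation between order statistics. This is the default method of `numpy.percentile` and pandas' `Series.quantile`. It explains why an integer column can have a fractional 5th percentile such as 72931.8.

The sketch below assumes that method. It also computes:

- the IQR, Q3 − Q1 (for df_index, 1093980 − 364659 = 729321);
- the median absolute deviation, the median of |x − median|.

The function names are illustrative.

```python
def quantile(values, q):
    """Quantile by linear interpolation between closest ranks (numpy/pandas default)."""
    data = sorted(values)
    pos = (len(data) - 1) * q  # fractional index into the sorted data
    lo = int(pos)  # floor, since pos >= 0
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def iqr(values):
    """Interquartile range: Q3 - Q1."""
    return quantile(values, 0.75) - quantile(values, 0.25)


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    med = quantile(values, 0.5)
    return quantile([abs(v - med) for v in values], 0.5)


print(quantile([1, 2, 3, 4], 0.5))
print(median_absolute_deviation([1, 1, 1, 2, 5]))
```

The second call mirrors passenger_count, where most trips have exactly one passenger. As a result, the median absolute deviation is 0 even though the values range from 0 to 9.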
Descriptive statistics
| Standard deviation | 421074.0269 |
|---|---|
| Coefficient of variation (CV) | 0.577351934 |
| Kurtosis | -1.199999715 |
| Mean | 729319.5052 |
| Median Absolute Deviation (MAD) | 364661 |
| Skewness | 4.526536029e-06 |
| Sum | 1.063812415e+12 |
| Variance | 1.773033361e+11 |
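For a continuous uniform distribution on [0, b], the coefficient of variation is (b/√12)/(b/2) = 1/√3 ≈ 0.577 and the excess kurtosis is −1.2. The table shows the same values for df_index (CV 0.5773519, kurtosis −1.1999997).

The sketch below computes these statistics from sample moments. It assumes the sample variance (ddof = 1) and the bias-corrected excess kurtosis that pandas' `Series.var` and `Series.kurt` use. It runs on the evenly spaced integers 0..999 as a stand-in for df_index.

```python
import math


def describe(values):
    """Mean, std, variance, CV and excess kurtosis of a numeric sample."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d ** 2 for d in dev) / n  # population 2nd central moment
    m4 = sum(d ** 4 for d in dev) / n  # population 4th central moment
    variance = m2 * n / (n - 1)  # sample variance (ddof=1)
    std = math.sqrt(variance)
    g2 = m4 / m2 ** 2 - 3  # biased excess kurtosis
    # Bias-corrected excess kurtosis (G2), the estimator behind pandas' Series.kurt
    kurtosis = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    return {
        "mean": mean,
        "std": std,
        "variance": variance,
        "cv": std / mean,  # coefficient of variation
        "kurtosis": kurtosis,
    }


stats = describe(list(range(1000)))
print(round(stats["cv"], 3), round(stats["kurtosis"], 3))
```

On `range(1000)`, both statistics land close to the uniform-distribution values: CV ≈ 0.578 and kurtosis ≈ −1.2.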
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2047 | 1 | < 0.1% |
| 1457042 | 1 | < 0.1% |
| 1170286 | 1 | < 0.1% |
| 1168239 | 1 | < 0.1% |
| 1125232 | 1 | < 0.1% |
| 1123185 | 1 | < 0.1% |
| 1129330 | 1 | < 0.1% |
| 1127283 | 1 | < 0.1% |
| 1117044 | 1 | < 0.1% |
| 1114997 | 1 | < 0.1% |
| Other values (1458627) | 1458627 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | < 0.1% |
| 1 | 1 | < 0.1% |
| 2 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1458643 | 1 | < 0.1% |
| 1458642 | 1 | < 0.1% |
| 1458641 | 1 | < 0.1% |
| 1458640 | 1 | < 0.1% |
| 1458639 | 1 | < 0.1% |
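The frequency tables can be derived from a plain value count. These are the top-10 list ending in an "Other values (k)" row, plus the five smallest and five largest values. The sketch below is an assumption about the approach, not pandas-profiling's code. It keeps the `top` most frequent values and folds the rest into one row, where k is the number of remaining distinct values. That is consistent with, for example, 23047 − 10 = 23037 for pickup_longitude. It also lists the k smallest and largest distinct values with their counts.

```python
from collections import Counter


def common_values(values, top=10):
    """Top-`top` (value, count, percent) rows plus an 'Other values (k)' row."""
    counts = Counter(values)
    n = len(values)
    rows = [(v, c, 100 * c / n) for v, c in counts.most_common(top)]
    rest = counts.most_common()[top:]  # every remaining distinct value
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, 100 * other / n))
    return rows


def extreme_values(values, k=5):
    """(value, count) pairs for the k smallest and k largest distinct values."""
    counts = Counter(values)
    keys = sorted(counts)
    smallest = [(v, counts[v]) for v in keys[:k]]
    largest = [(v, counts[v]) for v in reversed(keys[-k:])]  # descending
    return smallest, largest


print(common_values([1] * 7 + [2] * 2 + [3], top=2))
print(extreme_values(list(range(20)), k=3))
```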
vendor_id
Categorical
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 11.1 MiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 780295 | 53.5% |
| 1 | 678342 | 46.5% |
Length
| Max length | 1 |
|---|---|
| Mean length | 1 |
| Min length | 1 |
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal_Number | 2 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 2 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 2 | 100.0% |
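The three single-row tables for vendor_id appear to summarize the characters used in the values. Both distinct characters, "1" and "2", belong to Unicode general category Nd, whose long name is Decimal_Number. The Common and ASCII rows look like the matching script and block summaries.

Python's `unicodedata` module covers the general category. Script and block lookups are not in the standard library, so this sketch handles only the first table. The short-to-long name mapping covers just a few categories and is illustrative.

```python
import unicodedata
from collections import Counter

# Long names for a few Unicode general categories (illustrative subset only).
CATEGORY_NAMES = {
    "Nd": "Decimal_Number",
    "Lu": "Uppercase_Letter",
    "Ll": "Lowercase_Letter",
    "Zs": "Space_Separator",
}


def character_categories(values):
    """Count the distinct characters in `values` per Unicode general category."""
    chars = set("".join(values))
    counts = Counter(unicodedata.category(ch) for ch in chars)
    return {CATEGORY_NAMES.get(cat, cat): n for cat, n in counts.items()}


print(character_categories(["2", "1", "2", "1"]))
```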
passenger_count
Real number (ℝ≥0)
| Distinct count | 10 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.6645251697303716 |
|---|---|
| Minimum | 0 |
| Maximum | 9 |
| Zeros | 60 |
| Zeros (%) | < 0.1% |
| Memory size | 11.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 5 |
| Maximum | 9 |
| Range | 9 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.314241808 |
|---|---|
| Coefficient of variation (CV) | 0.7895595882 |
| Kurtosis | 3.431822493 |
| Mean | 1.66452517 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.128107148 |
| Sum | 2427938 |
| Variance | 1.727231529 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1033539 | 70.9% |
| 2 | 210315 | 14.4% |
| 5 | 78088 | 5.4% |
| 3 | 59895 | 4.1% |
| 6 | 48333 | 3.3% |
| 4 | 28402 | 1.9% |
| 0 | 60 | < 0.1% |
| 7 | 3 | < 0.1% |
| 9 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 60 | < 0.1% |
| 1 | 1033539 | 70.9% |
| 2 | 210315 | 14.4% |
| 3 | 59895 | 4.1% |
| 4 | 28402 | 1.9% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 7 | 3 | < 0.1% |
| 6 | 48333 | 3.3% |
| 5 | 78088 | 5.4% |
pickup_longitude
| Distinct count | 23047 |
|---|---|
| Unique (%) | 1.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -73.97348638725 |
|---|---|
| Minimum | -121.93334197998048 |
| Maximum | -61.33552932739258 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 11.1 MiB |
Quantile statistics
| Minimum | -121.933342 |
|---|---|
| 5-th percentile | -74.00686646 |
| Q1 | -73.99186707 |
| median | -73.98174286 |
| Q3 | -73.96733093 |
| 95-th percentile | -73.89159698 |
| Maximum | -61.33552933 |
| Range | 60.59781265 |
| Interquartile range (IQR) | 0.02453613281 |
Descriptive statistics
| Standard deviation | 0.07090187074 |
|---|---|
| Coefficient of variation (CV) | -0.000958476803 |
| Kurtosis | 288157.7011 |
| Mean | -73.97348639 |
| Median Absolute Deviation (MAD) | 0.01177978516 |
| Skewness | -418.1221795 |
| Sum | -107900464.3 |
| Variance | 0.005027075274 |
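The coefficient of variation is the standard deviation divided by the mean, so it takes the sign of the mean. The mean of pickup_longitude is negative, which makes its CV negative. Longitude also has an arbitrary zero point, so CV is not a meaningful measure of relative spread for this column:

$$
\mathrm{CV} = \frac{\sigma}{\mu} = \frac{0.07090187074}{-73.97348639} \approx -0.000958477
$$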
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| -73.98220062 | 633 | < 0.1% |
| -73.98213959 | 607 | < 0.1% |
| -73.98210144 | 587 | < 0.1% |
| -73.9821167 | 585 | < 0.1% |
| -73.98222351 | 584 | < 0.1% |
| -73.98209381 | 575 | < 0.1% |
| -73.9822464 | 558 | < 0.1% |
| -73.98220825 | 551 | < 0.1% |
| -73.98230743 | 546 | < 0.1% |
| -73.98217773 | 545 | < 0.1% |
| Other values (23037) | 1452866 | 99.6% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -121.933342 | 1 | < 0.1% |
| -121.9332352 | 1 | < 0.1% |
| -79.56973267 | 1 | < 0.1% |
| -79.48789978 | 1 | < 0.1% |
| -78.54740143 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -61.33552933 | 1 | < 0.1% |
| -65.84838867 | 1 | < 0.1% |
| -65.89738464 | 1 | < 0.1% |
| -66.97216034 | 1 | < 0.1% |
| -68.77843475 | 1 | < 0.1% |
pickup_latitude
Real number (ℝ≥0)
| Distinct count | 45245 |
|---|---|
| Unique (%) | 3.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 40.7509210022788 |
|---|---|
| Minimum | 34.359695434570305 |
| Maximum | 51.88108444213867 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 11.1 MiB |
Quantile statistics
| Minimum | 34.35969543 |
|---|---|
| 5-th percentile | 40.70814133 |
| Q1 | 40.73734665 |
| median | 40.7541008 |
| Q3 | 40.76836014 |
| 95-th percentile | 40.7883873 |
| Maximum | 51.88108444 |
| Range | 17.52138901 |
| Interquartile range (IQR) | 0.03101348877 |
Descriptive statistics
| Standard deviation | 0.03288106839 |
|---|---|
| Coefficient of variation (CV) | 0.0008068791474 |
| Kurtosis | 12950.48929 |
| Mean | 40.750921 |
| Median Absolute Deviation (MAD) | 0.01531982422 |
| Skewness | 5.489227968 |
| Sum | 59440801.16 |
| Variance | 0.001081164659 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 40.77410126 | 414 | < 0.1% |
| 40.77408981 | 411 | < 0.1% |
| 40.77412033 | 410 | < 0.1% |
| 40.77410889 | 392 | < 0.1% |
| 40.77407837 | 390 | < 0.1% |
| 40.77405167 | 376 | < 0.1% |
| 40.77413177 | 356 | < 0.1% |
| 40.7741394 | 352 | < 0.1% |
| 40.77407074 | 347 | < 0.1% |
| 40.77415848 | 335 | < 0.1% |
| Other values (45235) | 1454854 | 99.7% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 34.35969543 | 1 | < 0.1% |
| 34.7122345 | 1 | < 0.1% |
| 35.08153152 | 1 | < 0.1% |
| 35.31030655 | 1 | < 0.1% |
| 36.02930069 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 51.88108444 | 1 | < 0.1% |
| 44.37194443 | 1 | < 0.1% |
| 43.91176224 | 1 | < 0.1% |
| 43.48688507 | 1 | < 0.1% |
| 43.13965225 | 1 | < 0.1% |
dropoff_longitude
| Distinct count | 33821 |
|---|---|
| Unique (%) | 2.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -73.97341590774337 |
|---|---|
| Minimum | -121.9333038330078 |
| Maximum | -61.33552932739258 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 11.1 MiB |
Quantile statistics
| Minimum | -121.9333038 |
|---|---|
| 5-th percentile | -74.00753021 |
| Q1 | -73.99132538 |
| median | -73.97975159 |
| Q3 | -73.9630127 |
| 95-th percentile | -73.92017975 |
| Maximum | -61.33552933 |
| Range | 60.59777451 |
| Interquartile range (IQR) | 0.02831268311 |
Descriptive statistics
| Standard deviation | 0.07064342162 |
|---|---|
| Coefficient of variation (CV) | -0.0009549839054 |
| Kurtosis | 292524.8837 |
| Mean | -73.97341591 |
| Median Absolute Deviation (MAD) | 0.01317596436 |
| Skewness | -425.3309936 |
| Sum | -107900361.5 |
| Variance | 0.004990493018 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| -73.98233032 | 443 | < 0.1% |
| -73.98209381 | 433 | < 0.1% |
| -73.9822464 | 430 | < 0.1% |
| -73.9821167 | 427 | < 0.1% |
| -73.99137878 | 420 | < 0.1% |
| -73.98220062 | 419 | < 0.1% |
| -73.98226929 | 414 | < 0.1% |
| -73.99140167 | 406 | < 0.1% |
| -73.98238373 | 405 | < 0.1% |
| -73.98230743 | 403 | < 0.1% |
| Other values (33811) | 1454437 | 99.7% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -121.9333038 | 1 | < 0.1% |
| -121.9332047 | 1 | < 0.1% |
| -80.3554306 | 1 | < 0.1% |
| -79.81797791 | 1 | < 0.1% |
| -79.78613281 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -61.33552933 | 1 | < 0.1% |
| -65.84838867 | 1 | < 0.1% |
| -65.89738464 | 1 | < 0.1% |
| -68.77843475 | 1 | < 0.1% |
| -69.04801941 | 1 | < 0.1% |
dropoff_latitude
Real number (ℝ≥0)
| Distinct count | 62519 |
|---|---|
| Unique (%) | 4.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 40.75179953243315 |
|---|---|
| Minimum | 32.1811408996582 |
| Maximum | 43.92102813720703 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 11.1 MiB |
Quantile statistics
| Minimum | 32.1811409 |
|---|---|
| 5-th percentile | 40.69992065 |
| Q1 | 40.73588562 |
| median | 40.75452423 |
| Q3 | 40.76980972 |
| 95-th percentile | 40.79750824 |
| Maximum | 43.92102814 |
| Range | 11.73988724 |
| Interquartile range (IQR) | 0.03392410278 |
Descriptive statistics
| Standard deviation | 0.03589055387 |
|---|---|
| Coefficient of variation (CV) | 0.0008807108958 |
| Kurtosis | 4259.563861 |
| Mean | 40.75179953 |
| Median Absolute Deviation (MAD) | 0.01673126221 |
| Skewness | -20.67128816 |
| Sum | 59442082.61 |
| Variance | 0.001288131857 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 40.77431107 | 269 | < 0.1% |
| 40.77433014 | 263 | < 0.1% |
| 40.75014877 | 259 | < 0.1% |
| 40.75011826 | 253 | < 0.1% |
| 40.75019836 | 250 | < 0.1% |
| 40.75017166 | 247 | < 0.1% |
| 40.7743187 | 245 | < 0.1% |
| 40.77434158 | 244 | < 0.1% |
| 40.75003815 | 242 | < 0.1% |
| 40.75011063 | 242 | < 0.1% |
| Other values (62509) | 1456123 | 99.8% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 32.1811409 | 1 | < 0.1% |
| 34.35969543 | 1 | < 0.1% |
| 35.17354584 | 1 | < 0.1% |
| 36.02930069 | 1 | < 0.1% |
| 36.1185379 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 43.92102814 | 1 | < 0.1% |
| 43.91176224 | 1 | < 0.1% |
| 43.67399979 | 1 | < 0.1% |
| 43.48688507 | 1 | < 0.1% |
| 43.14758301 | 1 | < 0.1% |
store_and_fwd_flag
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 11.1 MiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1450592 | 99.4% |
| 1 | 8045 | 0.6% |
trip_duration
Real number (ℝ≥0)
| Distinct count | 458 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.499838547904654 |
|---|---|
| Minimum | 0 |
| Maximum | 58771 |
| Zeros | 8594 |
| Zeros (%) | 0.6% |
| Memory size | 11.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 6 |
| median | 11 |
| Q3 | 17 |
| 95-th percentile | 35 |
| Maximum | 58771 |
| Range | 58771 |
| Interquartile range (IQR) | 11 |
Descriptive statistics
| Standard deviation | 87.28289213 |
|---|---|
| Coefficient of variation (CV) | 5.631212987 |
| Kurtosis | 192198.9632 |
| Mean | 15.49983855 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | 343.2504643 |
| Sum | 22608638 |
| Variance | 7618.303258 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 89941 | 6.2% |
| 7 | 88516 | 6.1% |
| 5 | 88001 | 6.0% |
| 8 | 84549 | 5.8% |
| 4 | 81674 | 5.6% |
| 9 | 80107 | 5.5% |
| 10 | 74182 | 5.1% |
| 11 | 68801 | 4.7% |
| 3 | 66807 | 4.6% |
| 12 | 62969 | 4.3% |
| Other values (448) | 673090 | 46.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 8594 | 0.6% |
| 1 | 19222 | 1.3% |
| 2 | 44185 | 3.0% |
| 3 | 66807 | 4.6% |
| 4 | 81674 | 5.6% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 58771 | 1 | < 0.1% |
| 37126 | 1 | < 0.1% |
| 34159 | 1 | < 0.1% |
| 32328 | 1 | < 0.1% |
| 1439 | 93 | < 0.1% |
month
Real number (ℝ≥0)
| Distinct count | 6 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.516818783563011 |
|---|---|
| Minimum | 1 |
| Maximum | 6 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 11.1 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 4 |
| Q3 | 5 |
| 95-th percentile | 6 |
| Maximum | 6 |
| Range | 5 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 1.681037912 |
|---|---|
| Coefficient of variation (CV) | 0.4779995831 |
| Kurtosis | -1.229607444 |
| Mean | 3.516818784 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | -0.0147429711 |
| Sum | 5129762 |
| Variance | 2.825888462 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 256186 | 17.6% |
| 4 | 251645 | 17.3% |
| 5 | 248486 | 17.0% |
| 2 | 238299 | 16.3% |
| 6 | 234315 | 16.1% |
| 1 | 229706 | 15.7% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 229706 | 15.7% |
| 2 | 238299 | 16.3% |
| 3 | 256186 | 17.6% |
| 4 | 251645 | 17.3% |
| 5 | 248486 | 17.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 234315 | 16.1% |
| 5 | 248486 | 17.0% |
| 4 | 251645 | 17.3% |
| 3 | 256186 | 17.6% |
| 2 | 238299 | 16.3% |
day
Real number (ℝ≥0)
| Distinct count | 31 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.504042472527434 |
|---|---|
| Minimum | 1 |
| Maximum | 31 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 11.1 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 8 |
| median | 15 |
| Q3 | 23 |
| 95-th percentile | 29 |
| Maximum | 31 |
| Range | 30 |
| Interquartile range (IQR) | 15 |
Descriptive statistics
| Standard deviation | 8.703139533 |
|---|---|
| Coefficient of variation (CV) | 0.5613464713 |
| Kurtosis | -1.172200725 |
| Mean | 15.50404247 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | 0.04041463097 |
| Sum | 22614770 |
| Variance | 75.74463774 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 16 | 51026 | 3.5% |
| 14 | 50488 | 3.5% |
| 5 | 50175 | 3.4% |
| 12 | 50079 | 3.4% |
| 15 | 49791 | 3.4% |
| 4 | 49654 | 3.4% |
| 9 | 49633 | 3.4% |
| 6 | 49475 | 3.4% |
| 13 | 49293 | 3.4% |
| 19 | 49265 | 3.4% |
| Other values (21) | 959758 | 65.8% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 46612 | 3.2% |
| 2 | 47752 | 3.3% |
| 3 | 47945 | 3.3% |
| 4 | 49654 | 3.4% |
| 5 | 50175 | 3.4% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 31 | 22988 | 1.6% |
| 30 | 39135 | 2.7% |
| 29 | 46807 | 3.2% |
| 28 | 45891 | 3.1% |
| 27 | 46957 | 3.2% |
hour
Real number (ℝ≥0)
| Distinct count | 24 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 13.606475771559339 |
|---|---|
| Minimum | 0 |
| Maximum | 23 |
| Zeros | 53248 |
| Zeros (%) | 3.7% |
| Memory size | 11.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 9 |
| median | 14 |
| Q3 | 19 |
| 95-th percentile | 22 |
| Maximum | 23 |
| Range | 23 |
| Interquartile range (IQR) | 10 |
Descriptive statistics
| Standard deviation | 6.399703161 |
|---|---|
| Coefficient of variation (CV) | 0.4703424508 |
| Kurtosis | -0.7217068139 |
| Mean | 13.60647577 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | -0.4445819039 |
| Sum | 19846909 |
| Variance | 40.95620055 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 18 | 90599 | 6.2% |
| 19 | 90308 | 6.2% |
| 21 | 84184 | 5.8% |
| 20 | 84072 | 5.8% |
| 22 | 80492 | 5.5% |
| 17 | 76483 | 5.2% |
| 14 | 74291 | 5.1% |
| 12 | 71872 | 4.9% |
| 15 | 71811 | 4.9% |
| 13 | 71473 | 4.9% |
| Other values (14) | 663052 | 45.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 53248 | 3.7% |
| 1 | 38571 | 2.6% |
| 2 | 27972 | 1.9% |
| 3 | 20895 | 1.4% |
| 4 | 15792 | 1.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 23 | 69785 | 4.8% |
| 22 | 80492 | 5.5% |
| 21 | 84184 | 5.8% |
| 20 | 84072 | 5.8% |
| 19 | 90308 | 6.2% |
minute
Real number (ℝ≥0)
| Distinct count | 60 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 29.590176308430404 |
|---|---|
| Minimum | 0 |
| Maximum | 59 |
| Zeros | 23788 |
| Zeros (%) | 1.6% |
| Memory size | 11.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 15 |
| median | 30 |
| Q3 | 45 |
| 95-th percentile | 56 |
| Maximum | 59 |
| Range | 59 |
| Interquartile range (IQR) | 30 |
Descriptive statistics
| Standard deviation | 17.32471227 |
|---|---|
| Coefficient of variation (CV) | 0.5854886463 |
| Kurtosis | -1.207669989 |
| Mean | 29.59017631 |
| Median Absolute Deviation (MAD) | 15 |
| Skewness | -0.007614516428 |
| Sum | 43161326 |
| Variance | 300.1456552 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 48 | 25235 | 1.7% |
| 45 | 25094 | 1.7% |
| 50 | 25057 | 1.7% |
| 54 | 24802 | 1.7% |
| 52 | 24801 | 1.7% |
| 42 | 24791 | 1.7% |
| 49 | 24755 | 1.7% |
| 46 | 24750 | 1.7% |
| 47 | 24621 | 1.7% |
| 44 | 24620 | 1.7% |
| Other values (50) | 1210111 | 83.0% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 23788 | 1.6% |
| 1 | 23927 | 1.6% |
| 2 | 24091 | 1.7% |
| 3 | 24198 | 1.7% |
| 4 | 24075 | 1.7% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 59 | 23914 | 1.6% |
| 58 | 23999 | 1.6% |
| 57 | 24394 | 1.7% |
| 56 | 24528 | 1.7% |
| 55 | 24176 | 1.7% |
second
Real number (ℝ≥0)
| Distinct count | 60 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 29.473553735439317 |
|---|---|
| Minimum | 0 |
| Maximum | 59 |
| Zeros | 24290 |
| Zeros (%) | 1.7% |
| Memory size | 11.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 14 |
| median | 29 |
| Q3 | 44 |
| 95-th percentile | 57 |
| Maximum | 59 |
| Range | 59 |
| Interquartile range (IQR) | 30 |
Descriptive statistics
| Standard deviation | 17.31985024 |
|---|---|
| Coefficient of variation (CV) | 0.5876403774 |
| Kurtosis | -1.200135014 |
| Mean | 29.47355374 |
| Median Absolute Deviation (MAD) | 15 |
| Skewness | 0.002423214687 |
| Sum | 42991216 |
| Variance | 299.9772123 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 24684 | 1.7% |
| 30 | 24582 | 1.7% |
| 11 | 24548 | 1.7% |
| 33 | 24548 | 1.7% |
| 20 | 24541 | 1.7% |
| 9 | 24540 | 1.7% |
| 22 | 24529 | 1.7% |
| 37 | 24515 | 1.7% |
| 45 | 24504 | 1.7% |
| 8 | 24487 | 1.7% |
| Other values (50) | 1213159 | 83.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 24290 | 1.7% |
| 1 | 24241 | 1.7% |
| 2 | 24286 | 1.7% |
| 3 | 24684 | 1.7% |
| 4 | 24290 | 1.7% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 59 | 24280 | 1.7% |
| 58 | 24325 | 1.7% |
| 57 | 24347 | 1.7% |
| 56 | 24273 | 1.7% |
| 55 | 24332 | 1.7% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1:

- -1 indicates total negative linear correlation.
- 0 indicates no linear correlation.
- +1 indicates total positive linear correlation.

r is also invariant under separate changes in location and (positive) scale of the two variables. So for points lying exactly on a non-horizontal line, the steepness of the line does not affect r; only the direction of the slope does. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
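A minimal sketch of the definition above computes r as the sample covariance divided by the product of the sample standard deviations. Each of those three quantities uses n − 1, and that factor cancels. The test data are arbitrary.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their std devs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)


x = [1, 2, 3, 5, 8]
y = [2, 1, 4, 3, 7]
print(pearson_r(x, y))
# Shifting and positively rescaling either variable leaves r unchanged.
print(pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y]))
```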
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1:

- -1 indicates total negative monotonic correlation.
- 0 indicates no monotonic correlation.
- +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations. In other words, compute Pearson's r on the ranks.
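Following the definition above, ρ is Pearson's r applied to ranks. This sketch gives tied values the average of the ranks they span. That is a common convention and is what `scipy.stats.spearmanr` does. The sketch is an illustration, not the report's implementation.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson's r; the 1/(n-1) factors cancel, so plain sums are used."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(pearson_r(x, y), spearman_rho(x, y))
```

For the cubic example, ρ is exactly 1 because the ordering is perfectly preserved, while r stays below 1 because the relationship is not a straight line.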
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1:

- -1 indicates total negative correlation.
- 0 indicates no correlation.
- +1 indicates total positive correlation.

To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n − 1)/2 for n observations. This form assumes no ties and is often called τ-a.
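The pair-counting definition above translates directly into code. This O(n²) sketch computes the tie-free form (τ-a). It counts a pair tied in either variable as neither concordant nor discordant. Libraries usually adjust for ties instead: for example, `scipy.stats.kendalltau` computes τ-b by default.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1  # both variables move in the same direction
        elif s < 0:
            discordant += 1  # the variables move in opposite directions
        # s == 0: a tie in x or y, counted as neither
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

In the example, only the pair at positions 2 and 3 is discordant. That gives 5 concordant and 1 discordant out of 6 pairs, so τ = 4/6.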
Phik (φk)
Phik (φk) is a new and practical correlation coefficient. It works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available in the documentation of the phik package.
First rows
| | df_index | vendor_id | passenger_count | pickup_longitude | pickup_latitude | dropoff_longitude | dropoff_latitude | store_and_fwd_flag | trip_duration | month | day | hour | minute | second |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2 | 1 | -73.982155 | 40.767937 | -73.964630 | 40.765602 | 0 | 7 | 3 | 14 | 17 | 24 | 55 |
| 1 | 1 | 1 | 1 | -73.980415 | 40.738564 | -73.999481 | 40.731152 | 0 | 11 | 6 | 12 | 0 | 43 | 35 |
| 2 | 2 | 2 | 1 | -73.979027 | 40.763939 | -74.005333 | 40.710087 | 0 | 35 | 1 | 19 | 11 | 35 | 24 |
| 3 | 3 | 2 | 1 | -74.010040 | 40.719971 | -74.012268 | 40.706718 | 0 | 7 | 4 | 6 | 19 | 32 | 31 |
| 4 | 4 | 2 | 1 | -73.973053 | 40.793209 | -73.972923 | 40.782520 | 0 | 7 | 3 | 26 | 13 | 30 | 55 |
| 5 | 5 | 2 | 6 | -73.982857 | 40.742195 | -73.992081 | 40.749184 | 0 | 7 | 1 | 30 | 22 | 1 | 40 |
| 6 | 6 | 1 | 4 | -73.969017 | 40.757839 | -73.957405 | 40.765896 | 0 | 5 | 6 | 17 | 22 | 34 | 59 |
| 7 | 7 | 2 | 1 | -73.969276 | 40.797779 | -73.922470 | 40.760559 | 0 | 25 | 5 | 21 | 7 | 54 | 58 |
| 8 | 8 | 1 | 1 | -73.999481 | 40.738400 | -73.985786 | 40.732815 | 0 | 4 | 5 | 27 | 23 | 12 | 23 |
| 9 | 9 | 2 | 1 | -73.981049 | 40.744339 | -73.973000 | 40.789989 | 0 | 20 | 3 | 10 | 21 | 45 | 1 |
Last rows
| | df_index | vendor_id | passenger_count | pickup_longitude | pickup_latitude | dropoff_longitude | dropoff_latitude | store_and_fwd_flag | trip_duration | month | day | hour | minute | second |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1458627 | 1458634 | 1 | 2 | -73.989075 | 40.730465 | -73.963882 | 40.773739 | 0 | 16 | 4 | 3 | 13 | 51 | 25 |
| 1458628 | 1458635 | 2 | 1 | -73.985390 | 40.763020 | -73.989708 | 40.767502 | 0 | 3 | 5 | 19 | 14 | 46 | 55 |
| 1458629 | 1458636 | 2 | 1 | -73.863815 | 40.769684 | -73.864395 | 40.761326 | 0 | 13 | 2 | 12 | 10 | 13 | 6 |
| 1458630 | 1458637 | 1 | 1 | -73.975357 | 40.751705 | -73.949478 | 40.776764 | 0 | 12 | 4 | 17 | 18 | 48 | 16 |
| 1458631 | 1458638 | 2 | 5 | -73.988823 | 40.736553 | -73.989166 | 40.757393 | 0 | 6 | 2 | 2 | 0 | 39 | 39 |
| 1458632 | 1458639 | 2 | 4 | -73.982201 | 40.745522 | -73.994911 | 40.740170 | 0 | 12 | 4 | 8 | 13 | 31 | 4 |
| 1458633 | 1458640 | 1 | 1 | -74.000946 | 40.747379 | -73.970184 | 40.796547 | 0 | 10 | 1 | 10 | 7 | 35 | 15 |
| 1458634 | 1458641 | 2 | 1 | -73.959129 | 40.768799 | -74.004433 | 40.707371 | 0 | 12 | 4 | 22 | 6 | 57 | 41 |
| 1458635 | 1458642 | 1 | 1 | -73.982079 | 40.749062 | -73.974632 | 40.757107 | 0 | 6 | 1 | 5 | 15 | 56 | 26 |
| 1458636 | 1458643 | 1 | 1 | -73.979538 | 40.781750 | -73.972809 | 40.790585 | 0 | 3 | 4 | 5 | 14 | 44 | 25 |